Experiments with Arabic Topic Detection

Authors

  • RIM KOULALI
  • ABDELOUAFI MEZIANE
Abstract

The continuous growth of information on the Internet and the availability of a large mass of electronic documents in the Arabic language make Natural Language Processing (NLP) tasks important for facilitating access to, and exploitation of, this information. Among the available NLP tasks, we are interested in Arabic Topic Detection. Our objective is to build an indexing system capable of identifying the general topics discussed in unvowelized Arabic documents. The proposed topic detection system for Arabic texts is based on Mutual Information for selecting a Topic Oriented Vocabulary (TOV), and on classification according to the Jaccard and adapted TF-IDF indicators. The experimental results are reported in terms of precision, recall and F1 measure, evaluating the influence of factors such as vocabulary length and morphological analysis on Arabic Topic Detection.
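
The abstract does not spell out its scoring formulas, so what follows is only a minimal Python sketch of the general pipeline it names, under stated assumptions: mutual information is taken here as pointwise mutual information estimated from document counts, the Topic Oriented Vocabulary (TOV) of each topic is assumed to be its top-`size` terms by that score, and a document is assigned to the topic whose vocabulary has the highest Jaccard similarity to the document's term set. The function names (`mutual_information`, `topic_vocabulary`, `jaccard`, `classify`) and the toy Arabic data are hypothetical. The authors' adapted TF-IDF indicator is not reproduced, because the abstract does not define the adaptation.

```python
import math
from collections import Counter, defaultdict


def mutual_information(docs, labels):
    """Pointwise mutual information for every observed (term, topic) pair.

    docs   -- list of token lists (e.g. unvowelized Arabic words)
    labels -- the topic label of each document
    Assumption: probabilities are estimated from document counts.
    """
    n = len(docs)
    topic_df = Counter(labels)  # number of documents per topic
    term_df = Counter()         # number of documents containing each term
    joint = Counter()           # documents of topic c that contain term t
    for tokens, topic in zip(docs, labels):
        for term in set(tokens):
            term_df[term] += 1
            joint[(term, topic)] += 1
    # PMI(t, c) = log( P(t, c) / (P(t) * P(c)) ) with P(x) = count(x) / n
    return {
        (term, topic): math.log(n_tc * n / (term_df[term] * topic_df[topic]))
        for (term, topic), n_tc in joint.items()
    }


def topic_vocabulary(docs, labels, size):
    """Hypothetical TOV: keep the `size` highest-scoring terms per topic."""
    scored = defaultdict(list)
    for (term, topic), score in mutual_information(docs, labels).items():
        scored[topic].append((score, term))
    return {
        topic: {term for _, term in sorted(pairs, reverse=True)[:size]}
        for topic, pairs in scored.items()
    }


def jaccard(a, b):
    """|A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(tokens, vocab):
    """Pick the topic whose vocabulary is most Jaccard-similar to the document."""
    doc_terms = set(tokens)
    return max(vocab, key=lambda topic: jaccard(doc_terms, vocab[topic]))


# Toy training data (hypothetical): "اليوم" ("today") occurs in both topics.
docs = [
    ["مباراة", "هدف", "فريق", "اليوم"],
    ["فريق", "لاعب", "هدف"],
    ["حكومة", "انتخابات", "وزير", "اليوم"],
    ["وزير", "برلمان", "حكومة"],
]
labels = ["sport", "sport", "politics", "politics"]

vocab = topic_vocabulary(docs, labels, size=4)
print(classify(["هدف", "لاعب", "ملعب"], vocab))  # prints: sport
```

In this toy run the shared word "اليوم" gets a PMI of log 1 = 0 for both topics and falls out of each four-term vocabulary, which is the kind of filtering a topic-oriented vocabulary is meant to perform. Varying `size` corresponds only loosely to the vocabulary-length factor the abstract evaluates; morphological analysis (for example stemming or root extraction) would be applied to the tokens before they reach `mutual_information`.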

Similar resources

Traffic Scene Analysis using Hierarchical Sparse Topical Coding

Analyzing motion patterns in traffic videos can be exploited directly to generate high-level descriptions of the video contents. Such descriptions may further be employed in different traffic applications such as traffic phase detection and abnormal event detection. One of the most recent and successful unsupervised methods for complex traffic scene analysis is based on topic models. In this pa...

Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media

Social media allows people to interact and express their thoughts or feelings about different subjects. However, some users may post offensive tweets aimed at others, which is known as cyberbullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text "tweets" via one of the machine l...

TDT-2002 Topic Tracking at Maryland: First Experiments with the Lemur Toolkit

The University of Maryland submitted six topic tracking runs for the 2002 Topic Detection and Tracking evaluation. Two runs were produced using the Lemur language modeling toolkit; the remaining four were produced using a separate system coded in Perl. The Lemur runs outperformed the Perl runs on the required condition because term frequency information was better handled. Two of the Perl runs...

Mani’s Living Gospel: A New Approach to the Arabic and Classical New Persian Testimonia

In order to reconstruct the contents of the most famous work of Mani, Living Gospel (written originally in Syriac), we have to use the Arabic and Classical New Persian texts containing accounts and even indirect quotations of this book. One of the most remarkable points in these accounts is that they clearly show that an important part of the Living Gospel contains the Manicha...

Evaluation of Topic Identification Methods on Arabic Corpora

Topic Identification is one of the important keys to the success of many applications. However, few works in this field concern the Arabic language, because of the lack of standard corpora. In this study, we will provide directly comparable results of six text categorization methods on a new Arabic corpus, Alwatan-2004. Hence, Topic Unigram Language Model (TULM), Term Frequency/Inverse Docum...

Publication date: 2013